Accelerator Codesign as Non-Linear Optimization

Authors

  • Nirmal Prajapati
  • Sanjay V. Rajopadhye
  • Hristo Djidjev
  • Nandkishore Santhi
  • Tobias Grosser
  • Rumen Andonov
Abstract

We propose an optimization approach for determining both hardware and software parameters for the efficient implementation of a (family of) applications called dense stencil computations on programmable GPGPUs. We first introduce a simple, analytical model for the silicon area usage of accelerator architectures and a workload characterization of stencil computations. We combine this characterization with a parametric execution time model and formulate a mathematical optimization problem. That problem seeks to maximize a common objective function of all the hardware and software parameters. The solution to this problem therefore “solves” the codesign problem: simultaneously choosing software-hardware parameters to optimize total performance. We validate this approach by proposing architectural variants of the NVIDIA Maxwell GTX-980 (respectively, Titan X) specifically tuned to a predetermined workload of four common 2D stencils (Heat, Jacobi, Laplacian, and Gradient) and two 3D ones (Heat and Laplacian). Our model predicts that performance would potentially improve by 28% (respectively, 33%) with simple tweaks to the hardware parameters such as adapting coarse and fine-grained parallelism by changing the number of streaming multiprocessors and the number of compute cores each contains. We propose a set of Pareto-optimal design points to exploit the trade-off between performance and silicon area and show that by additionally eliminating GPU caches, we can get a further 2-fold improvement.
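As a rough illustration of the formulation described in the abstract, the sketch below treats codesign as a single constrained optimization over hardware parameters (number of streaming multiprocessors, cores per multiprocessor) and one software parameter (tile size). Everything in it is an assumption made for illustration: the constants, the function names `area`, `exec_time`, and `best_design`, and the area and time formulas are hypothetical and are not taken from the paper. It also swaps in a brute-force search over a small grid in place of a real non-linear solver.

```python
from itertools import product

# Hypothetical constants, chosen only to make the toy model run.
AREA_PER_SM_OVERHEAD = 4.0    # fixed silicon area per SM (mm^2), assumed
AREA_PER_CORE = 0.05          # area per compute core (mm^2), assumed
AREA_PER_TILE_POINT = 0.001   # scratchpad area per tile point (mm^2), assumed
WORK = 1.0e9                  # stencil point updates in the workload, assumed
CLOCK_HZ = 1.0e9              # one point update per core per cycle, assumed

# Small candidate grids for the hardware and software parameters.
SM_COUNTS = range(4, 65, 4)
CORES_PER_SM = (32, 64, 128, 256)
TILE_SIZES = (8, 16, 32, 64)


def area(n_sm, cores, tile):
    """Toy area model: per-SM overhead, cores, and a scratchpad sized for one tile."""
    per_sm = AREA_PER_SM_OVERHEAD + cores * AREA_PER_CORE + tile * tile * AREA_PER_TILE_POINT
    return n_sm * per_sm


def exec_time(n_sm, cores, tile):
    """Toy time model: smaller tiles redo more halo work, more cores run faster."""
    redundancy = (1.0 + 2.0 / tile) ** 2
    return WORK * redundancy / (n_sm * cores * CLOCK_HZ)


def best_design(area_budget):
    """Return the (n_sm, cores, tile) with the lowest modeled time within the budget."""
    feasible = [
        d for d in product(SM_COUNTS, CORES_PER_SM, TILE_SIZES)
        if area(*d) <= area_budget
    ]
    return min(feasible, key=lambda d: exec_time(*d), default=None)


if __name__ == "__main__":
    for budget in (100.0, 200.0, 400.0):
        design = best_design(budget)
        print(budget, design, exec_time(*design))
```

Sweeping the area budget, as the main block does, gives one best design per budget; under these assumptions that sequence is a crude stand-in for the performance-versus-area trade-off the abstract mentions.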

Similar resources

Integer Programming for Partitioning in Software

This paper presents a new partitioning method for software-oriented hardware/software codesign. It is applied to the use of field-programmable accelerator boards. In the underlying model the dedicated hardware has no direct access to the host memory, and communication is slow. Therefore detailed data-flow information is necessary to minimize the communication overhead between host and accelerator...

Medium-Term Stability of the Photon Beam Energy of An Elekta Compact™ Linear Accelerator Based on Daily Measurements of Beam Quality Factor

Introduction In this study, we aimed to assess the medium-term energy stability of a 6MV Elekta Compact™ linear accelerator. To the best of our knowledge, this is the first published article to evaluate this linear accelerator in terms of energy stability. As well as investigating the stability of the linear accelerator energy over a period of several weeks, the results will be useful for esti...

NUMERICAL OPTIMIZATION OF ACCELERATORS WITHIN oPAC

Powerful simulation tools are required for every accelerator and light source to study the motion of charged particles through electromagnetic fields during the accelerator design process, to optimize the performance of machine diagnostics and to assess beam stability and non-linear effects. The Optimization of Particle Accelerators (oPAC) Project is funded by the EU within the 7th Framework Prog...

Performance-Evaluation in Xputer-based Accelerators

The paper presents the performance analysis process within the parallelizing compilation environment CoDe-X for simultaneous programming of Xputer-based accelerators and their host. The paper introduces briefly its hardware/software codesign strategies at two levels of partitioning. CoDe-X performs both, at first level a profiling-driven host/accelerator partitioning for performance optimizatio...

A Novel Optimal Setting for Directional over Current Relay Coordination using Particle Swarm Optimization

Over Current Relays (OCRs) and Directional Over Current Relays (DOCRs) are widely used for the radial protection and ring sub transmission protection systems and for distribution systems. All previous work formulates the DOCR coordination problem either as a Non-Linear Programming (NLP) for TDS and Ip or as a Linear Programming (LP) for TDS using recently a social behavior (Particle Swarm Optim...

Journal title:
  • CoRR

Volume abs/1712.04892

Publication date 2017